
Conversation

@sjmonson (Collaborator) commented on Oct 9, 2025

Summary

Makes the request key used to cap output tokens configurable per endpoint type through an environment variable. Defaults to `max_tokens` for legacy completions and `max_completion_tokens` for chat completions.

Details

  • Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option, a dict mapping route name -> output tokens key. The default is `{"text_completions": "max_tokens", "chat_completions": "max_completion_tokens"}`.
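For illustration only, an override from the environment might look like the sketch below; the JSON encoding of the dict value is an assumption based on common pydantic-settings conventions, not something this PR specifies:

```python
import os

# Hypothetical override: use the legacy "max_tokens" key for chat completions
# as well (e.g., for servers that reject "max_completion_tokens").
# Assumes dict-valued settings are parsed from a JSON string in the environment.
os.environ["GUIDELLM__OPENAI__MAX_OUTPUT_KEY"] = (
    '{"text_completions": "max_tokens", "chat_completions": "max_tokens"}'
)
```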

Test Plan

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI-written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: Tyler Michael Smith <[email protected]>
@sjmonson changed the title from "Fix/drop max completion tokens" to "Configurable max_tokens/max_completion_tokens key" on Oct 9, 2025
@sjmonson force-pushed the fix/drop_max_completion_tokens branch from 68e69bc to ef981fd on October 9, 2025 19:30
@sjmonson requested review from markurtz and Copilot on October 9, 2025 20:01
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements configurable request keys for output token limits in OpenAI API calls. Instead of hardcoding both `max_tokens` and `max_completion_tokens` in every request, the system now uses the appropriate key based on endpoint type through a new environment variable configuration.

  • Adds `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` configuration mapping endpoint types to their respective output token keys
  • Updates payload generation to use the configured key instead of setting both keys
  • Fixes test assertions to match the new single-key approach

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • src/guidellm/config.py: Adds the new `max_output_key` configuration with defaults for text and chat completions
  • src/guidellm/backend/openai.py: Updates payload generation to use the configurable key and adds type definitions
  • tests/unit/conftest.py: Removes duplicate token limit assertions and fixes mock response generation
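As a rough sketch of the payload-generation change described above (the function name, route strings, and inline default are illustrative, not the actual guidellm internals):

```python
DEFAULT_MAX_OUTPUT_KEY = {
    "text_completions": "max_tokens",
    "chat_completions": "max_completion_tokens",
}

def apply_output_limit(
    payload: dict, route: str, max_output_tokens: int | None
) -> dict:
    """Set only the configured output-token key for the given route."""
    if max_output_tokens is not None:
        # Previously both keys were set unconditionally; now only the
        # key configured for this route is added to the request body.
        key = DEFAULT_MAX_OUTPUT_KEY.get(route, "max_tokens")
        payload[key] = max_output_tokens
    return payload

# Example: a chat request carries only "max_completion_tokens".
body = apply_output_limit({"model": "demo"}, "chat_completions", 128)
assert body == {"model": "demo", "max_completion_tokens": 128}
```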

@sjmonson mentioned this pull request on Oct 9, 2025
@jaredoconnell (Collaborator) left a comment

You may want to wait for Mark's review, but looks good to me.

@sjmonson merged commit 121dcdc into main on Oct 10, 2025
17 checks passed
@sjmonson deleted the fix/drop_max_completion_tokens branch on October 10, 2025 13:36
sjmonson added a commit that referenced this pull request Oct 10, 2025
Development

Successfully merging this pull request may close these issues:

Guidellm adds unexpected field to requests